112 research outputs found

    Whole-Body Motion Capture and Beyond: From Model-Based Inference to Learning-Based Regression

    Get PDF
    Though effective and successful, traditional marker-less Motion Capture (MoCap) methods suffer from several limitations: 1) they presume a character-specific body model, which prevents a fully automatic pipeline and generalization over diverse body shapes; 2) they do not track the objects humans interact with, although human-object interaction is ubiquitous in reality; 3) they rely heavily on a sophisticated optimization process that requires a good initialization and strong priors, and can be slow. This thesis addresses all of these issues. First, we propose a fully automatic method to accurately reconstruct a 3D human body from multi-view RGB videos, the typical setup for MoCap systems. We pre-process all RGB videos to obtain 2D keypoints and silhouettes, then fit the SMPL body model to these 2D measurements in two successive stages: in the first stage, the shape and pose parameters of SMPL are estimated sequentially, frame by frame; in the second stage, a batch of frames is refined jointly with an additional Discrete Cosine Transform (DCT) prior. Our method naturally handles different body shapes and challenging poses without human intervention. Second, we extend this system to track rigid objects the subjects interact with. Our setup consists of 6 Azure Kinect RGB-D cameras. We pre-process all videos by segmenting humans and objects and detecting 2D body joints, and we adopt the SMPL-X model to better capture body and hand pose. The model is fitted to the 2D keypoints and accumulated point clouds, and we show that body pose provides important cues for better object tracking. The body and object poses are then jointly optimized under contact and interpenetration constraints. With this approach, we capture the first human-object interaction dataset with natural RGB images and plausible body and object motion. Lastly, we present the first practical and lightweight MoCap system that needs only 6 inertial measurement units (IMUs). Our approach is based on bi-directional recurrent neural networks (Bi-RNNs), which exploit temporal dependencies by jointly reasoning about past and future IMU measurements. To handle the scarcity of training data, we create synthetic data from archival MoCap data. Overall, our system runs ten times faster than traditional optimization-based methods and is numerically more accurate. We also show it is feasible to estimate which activity the subject is performing by observing only the IMU measurements from a smartwatch worn by the subject; this is useful for high-level semantic understanding of human behavior, but also alerts the public to potential privacy concerns. In summary, we advance marker-less MoCap by contributing the first automatic yet accurate system, extending MoCap methods to support rigid object tracking, and proposing a practical and lightweight algorithm using 6 IMUs. We believe our work makes marker-less and IMU-based MoCap cheaper and more practical, and thus closer to end-users for daily use.
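    To make the Bi-RNN idea concrete, here is a minimal PyTorch sketch of an IMU-to-pose regressor in the spirit described above. The layer sizes, the per-IMU feature layout (a flattened rotation matrix plus acceleration), and the SMPL pose dimensionality are illustrative assumptions, not the thesis's exact architecture.

```python
# Minimal sketch: 6 IMUs -> per-frame SMPL pose parameters via a Bi-RNN.
# Dimensions below are assumptions for illustration.
import torch
import torch.nn as nn

class BiRNNPoseRegressor(nn.Module):
    def __init__(self, n_imus=6, feat_per_imu=12, hidden=256, n_smpl_pose=72):
        super().__init__()
        in_dim = n_imus * feat_per_imu  # per IMU: 3x3 rotation (9) + acceleration (3)
        self.rnn = nn.LSTM(in_dim, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_smpl_pose)  # 24 joints x 3 axis-angle

    def forward(self, imu_seq):
        # imu_seq: (batch, time, in_dim); the bidirectional pass lets every
        # frame condition on both past and future IMU measurements.
        h, _ = self.rnn(imu_seq)
        return self.head(h)  # (batch, time, n_smpl_pose)

model = BiRNNPoseRegressor()
poses = model(torch.randn(1, 100, 72))  # 100 frames of dummy IMU input
```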

    Real-time moving object classification with automatic scene division

    Get PDF
    We address the problem of moving object classification, aiming to classify moving objects in traffic-scene videos into pedestrians, bicycles, and vehicles. Instead of supervised learning on large, manually labeled training sets, our classifiers are initialized and refined online automatically. With efficient features extracted and organized, the approach runs in real time and achieves high classification accuracy. Once a view or scene change is detected, the algorithm automatically refines the classifiers and adapts them to the new environment. Experimental results demonstrate the effectiveness and robustness of the proposed approach.
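    As an illustration of the online-refinement idea, the sketch below updates a classifier incrementally and re-initializes it when a scene change is flagged. The feature extractor, the automatic labelling heuristic, and the choice of an SGD classifier are stand-ins, not the paper's method.

```python
# Online classifier refinement with scene-change handling (illustrative only).
import numpy as np
from sklearn.linear_model import SGDClassifier

CLASSES = np.array([0, 1, 2])  # pedestrian, bicycle, vehicle

clf = SGDClassifier(loss="log_loss")
initialized = False

def on_new_track(features, auto_label, scene_changed):
    """features: 1-D feature vector for a moving-object track;
    auto_label: label obtained automatically (e.g. size/speed heuristics), or None;
    scene_changed: True when a view/scene change is detected."""
    global clf, initialized
    if scene_changed:           # re-initialize and adapt to the new scene
        clf = SGDClassifier(loss="log_loss")
        initialized = False
    X = np.asarray(features).reshape(1, -1)
    if auto_label is not None:  # refine the classifier online
        clf.partial_fit(X, [auto_label], classes=CLASSES)
        initialized = True
    return clf.predict(X)[0] if initialized else None
```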

    MusiLingo: Bridging Music and Text with Pre-trained Language Models for Music Captioning and Query Response

    Full text link
    Large Language Models (LLMs) have shown immense potential in multimodal applications, yet the convergence of the textual and musical domains remains relatively unexplored. To address this gap, we present MusiLingo, a novel system for music caption generation and music-related query response. MusiLingo employs a single projection layer to align music representations from the pre-trained, frozen music audio model MERT with the frozen LLaMA language model, bridging the gap between music audio and textual contexts. We train it on an extensive music caption dataset and fine-tune it with instructional data. Due to the scarcity of high-quality music Q&A datasets, we create the MusicInstruct (MI) dataset from MusicCaps, tailored for open-ended music inquiries. Empirical evaluations demonstrate competitive performance in generating music captions and composing music-related Q&A pairs, and the introduced dataset enables notable advancements beyond previous ones.
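    A minimal sketch of the single-projection-layer alignment described above: frozen audio features are linearly mapped into the LLM's token-embedding space and prepended to the text embeddings. The dimensions (MERT-style 1024-d features, LLaMA-7B's 4096-d embeddings) are assumptions for illustration, not the released configuration.

```python
# Align frozen music-encoder features with a frozen LLM via one linear layer.
import torch
import torch.nn as nn

class MusicToLLMAdapter(nn.Module):
    def __init__(self, music_dim=1024, llm_dim=4096):
        super().__init__()
        self.proj = nn.Linear(music_dim, llm_dim)  # the only trainable part

    def forward(self, music_feats, text_embeds):
        # music_feats: (batch, n_frames, music_dim) from the frozen audio encoder
        # text_embeds: (batch, n_tokens, llm_dim) from the frozen LLM embedding table
        music_tokens = self.proj(music_feats)
        # Concatenated sequence is fed to the frozen LLM as its input embeddings.
        return torch.cat([music_tokens, text_embeds], dim=1)
```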

    MindLLM: Pre-training Lightweight Large Language Model from Scratch, Evaluations and Domain Applications

    Full text link
    Large Language Models (LLMs) have demonstrated remarkable performance across various natural language tasks, marking significant strides towards general artificial intelligence. While such progress is typically pursued by developing increasingly large-scale models, another branch is to develop lightweight custom models that better serve certain domains, given the high cost of training and deploying LLMs and the scarcity of resources. In this paper, we present MindLLM, a novel series of bilingual lightweight large language models trained from scratch, alleviating such burdens by offering models with 1.3 billion and 3 billion parameters. A thorough account of the experience accrued during large-model development is given, covering every step of the process, including data construction, model architecture, evaluation, and applications; we hope these insights are valuable for fellow academics and developers. MindLLM consistently matches or surpasses the performance of other open-source larger models on some public benchmarks. We also introduce an innovative instruction-tuning framework tailored for smaller models to enhance their capabilities efficiently. Moreover, we explore the application of MindLLM in specific vertical domains such as law and finance, underscoring the agility and adaptability of our lightweight models.

    Experimental Study on Stress and Strain Characteristics of Solidified Clay under Seawater Condition

    Get PDF
    This paper presents the results of a laboratory study on the stress-strain relationship of clay solidified under seawater-corrosion conditions. An automatic triaxial apparatus was used, and the axial stress and strain were monitored continuously. The dry density was 1.0 g/cm3, the cement contents were 4, 6, 8, and 10% by weight of dry soil particles, and the curing times were 28, 60, and 90 days. The results indicate that the stress-strain relationship of cemented clay is affected by soil density, cement content, and curing period. A transition from strain hardening to strain softening occurred as cement content increased, and a strong structure forms in cemented clay when the admixture content is 10% or more. The increase in strength of the solidified foundation results from increases in the internal friction angle and the cohesive force: the cohesive force increases markedly with cement content and curing age, whereas the internal friction angle changes little after reaching a certain value.
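    For reference, the two strength parameters discussed above combine through the standard Mohr-Coulomb criterion, presumably the relation underlying the triaxial analysis:

```latex
% Mohr-Coulomb shear strength
\tau_f = c + \sigma_n \tan\varphi
% \tau_f: shear strength, c: cohesion (cohesive force),
% \sigma_n: normal stress, \varphi: internal friction angle
```

    Under this relation, a marked rise in c with cement content and curing age raises strength across all confining pressures, even when \varphi saturates.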

    Phosphorus recovery from anaerobically digested liquor of screenings

    Get PDF
    Phosphorus is a limited resource that is predicted to be exhausted at some point during the twenty-first century. However, it is present in wastewaters at concentrations that come close to supplying the nation's annual fertiliser requirements. Many papers have addressed the recovery of phosphorus as struvite (magnesium ammonium phosphate hexahydrate) from different types of waste; the most prominent use of struvite is as a slow-release fertiliser for agricultural application, suitable as a replacement for chemical fertiliser. In this study, screenings produced during the wastewater treatment process were anaerobically digested, and the resulting digested liquor was used for phosphorus recovery in the form of struvite at different dry-solids concentrations. The theoretical struvite potential was calculated using a 1:1:1 (Mg:N:P) molar ratio. The results show that the digestate is high in phosphorus, of which up to 41% can be recovered; at 3% dry solids, 0.27 kg of struvite can be recovered per kg dry solids of screenings. Screenings thus prove a valuable source of additional phosphorus which current disposal practices fail to exploit.
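    The 1:1:1 molar-ratio calculation mentioned above reduces to capping struvite formation by the scarcest of the three constituents. The sketch below shows the arithmetic; the example concentrations are placeholders, not the study's measured values.

```python
# Theoretical struvite (MgNH4PO4·6H2O) potential from a 1:1:1 Mg:N:P ratio.
M_STRUVITE = 245.41  # g/mol, molar mass of struvite

def struvite_potential(mg_mol, n_mol, p_mol):
    """Moles of struvite are limited by the scarcest of Mg, N, and P."""
    return min(mg_mol, n_mol, p_mol) * M_STRUVITE  # grams of struvite

# e.g. liquor with P limiting: 10 mmol Mg, 20 mmol N, 8 mmol P per litre
grams_per_litre = struvite_potential(0.010, 0.020, 0.008)
print(f"{grams_per_litre:.2f} g struvite per litre")  # ~1.96 g/L
```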

    MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training

    Full text link
    Self-supervised learning (SSL) has recently emerged as a promising paradigm for training generalisable models on large-scale data in the fields of vision, text, and speech. Although SSL has proven effective in speech and audio, its application to music audio has yet to be thoroughly explored, primarily due to the distinctive challenges of modelling musical knowledge, particularly the tonal and pitched characteristics of music. To address this research gap, we propose an acoustic Music undERstanding model with large-scale self-supervised Training (MERT), which incorporates teacher models to provide pseudo labels for masked language modelling (MLM)-style acoustic pre-training. In our exploration, we identified a superior combination of teacher models that outperforms conventional speech and audio approaches: an acoustic teacher based on a Residual Vector Quantization Variational AutoEncoder (RVQ-VAE) and a musical teacher based on the Constant-Q Transform (CQT). These teachers effectively guide our student model, a BERT-style transformer encoder, to better model music audio. In addition, we introduce an in-batch noise mixture augmentation to enhance representation robustness, and we explore a wide range of settings to overcome the instability of acoustic language model pre-training, which allows our designed paradigm to scale from 95M to 330M parameters. Experimental results indicate that our model generalises well across 14 music understanding tasks and attains state-of-the-art (SOTA) overall scores. The code and models are online: https://github.com/yizhilll/MERT
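    Schematically, the multi-teacher pre-training above amounts to a student transformer predicting, at masked frames, discrete codes from the acoustic (RVQ-VAE) teacher and CQT frames from the musical teacher. The sketch below illustrates this; the model size, codebook size, CQT bin count, and equal loss weighting are assumptions, not MERT's published configuration.

```python
# MLM-style masked pre-training against two teachers (illustrative shapes).
import torch
import torch.nn as nn

class MaskedMusicStudent(nn.Module):
    def __init__(self, d_model=768, n_codes=1024, cqt_bins=84):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=12, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=12)
        self.code_head = nn.Linear(d_model, n_codes)  # acoustic-teacher targets
        self.cqt_head = nn.Linear(d_model, cqt_bins)  # musical-teacher targets

    def forward(self, frames, mask, code_targets, cqt_targets):
        # frames: (B, T, d_model); mask: (B, T) bool, True at masked frames
        h = self.encoder(frames)
        ce = nn.functional.cross_entropy(
            self.code_head(h)[mask], code_targets[mask])      # discrete codes
        mse = nn.functional.mse_loss(
            self.cqt_head(h)[mask], cqt_targets[mask])        # CQT regression
        return ce + mse  # combined masked-prediction loss
```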

    The Natural Compound Myricetin Effectively Represses the Malignant Progression of Prostate Cancer by Inhibiting PIM1 and Disrupting the PIM1/CXCR4 Interaction

    Get PDF
    Background/Aims: Natural compounds are a promising resource for anti-tumor drugs. Myricetin, an abundant flavonoid found in the bark and leaves of bayberry, shows multiple promising anti-tumor functions in various cancers. Methods: The cytotoxic, pro-apoptotic, and anti-metastatic effects of myricetin on prostate cancer cells were investigated in both in vitro and in vivo studies. Short-hairpin RNA knockdown of the proviral integration site for Moloney murine leukemia virus-1 (PIM1), pull-down and co-immunoprecipitation assays, and an intracellular Ca2+ flux assay were used to investigate the potential underlying mechanism of myricetin. ONCOMINE database mining and immunohistochemical analysis of prostate cancer tissues were used to evaluate the expression of PIM1 and CXCR4, as well as the correlation between PIM1/CXCR4 expression and the clinicopathologic characteristics and prognoses of prostate cancer patients. Results: Myricetin exerted selective cytotoxic, pro-apoptotic, and anti-metastatic effects on prostate cancer cells by inhibiting PIM1 and disrupting the PIM1/CXCR4 interaction. Moreover, PIM1 and CXCR4 were coexpressed and associated with aggressive clinicopathologic traits and poor prognosis in prostate cancer patients. Conclusion: These results offer preclinical evidence for myricetin as a potential chemopreventive and therapeutic agent for precision medicine tailored to prostate cancer patients characterized by concomitant elevated expression of PIM1 and CXCR4.